Search | BVS (Virtual Health Library) Regional Portal

1.

Hybrid-hybrid correction of errors in long reads with HERO.

Kang, Xiongbin; Xu, Jialu; Luo, Xiao; Schönhuth, Alexander.

Genome Biol ; 24(1): 275, 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38041098

ABSTRACT

Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27-95%) and 20% (4-61%), respectively. Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
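
The two graph types named in the abstract can be illustrated with a minimal sketch. This is not HERO's implementation: the function names, the choice of k, and the exact-match overlap test are assumptions made purely for illustration. A de Bruijn graph links (k-1)-mers that co-occur in a k-mer, which suits short accurate reads, while an overlap graph links whole reads whose ends match, which suits long reads.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` (>= 1) exists.
    Real long-read overlappers tolerate errors; exact matching is a
    simplification for this sketch.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


# Hypothetical toy reads, not data from the paper.
short_read_graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
long_read_overlap = suffix_prefix_overlap("ACGTAC", "TACGG", min_len=2)
```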

Subjects

Algorithms, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Benchmarking

2.

pan-MHC and cross-Species Prediction of T Cell Receptor-Antigen Binding.

Han, Yi; Yang, Yuqiu; Tian, Yanhua; Fattah, Farjana J; von Itzstein, Mitchell S; Hu, Yifei; Zhang, Minying; Kang, Xiongbin; Yang, Donghan M; Liu, Jialiang; Xue, Yaming; Liang, Chaoying; Raman, Indu; Zhu, Chengsong; Xiao, Olivia; Dowell, Jonathan E; Homsi, Jade; Rashdan, Sawsan; Yang, Shengjie; Gwin, Mary E; Hsiehchen, David; Gloria-McCutchen, Yvonne; Pan, Ke; Wu, Fangjiang; Gibbons, Don; Wang, Xinlei; Yee, Cassian; Huang, Junzhou; Reuben, Alexandre; Cheng, Chao; Zhang, Jianjun; Gerber, David E; Wang, Tao.

bioRxiv ; 2023 Dec 12.

Article in English | MEDLINE | ID: mdl-38105939

ABSTRACT

Profiling the binding of T cell receptors (TCRs) of T cells to antigenic peptides presented by MHC proteins is one of the most important unsolved problems in modern immunology. Experimental methods to probe TCR-antigen interactions are slow, labor-intensive, costly, and yield moderate throughput. To address this problem, we developed pMTnet-omni, an Artificial Intelligence (AI) system based on hybrid protein sequence and structure information, to predict the pairing of TCRs of αβ T cells with peptide-MHC complexes (pMHCs). pMTnet-omni is capable of handling peptides presented by both class I and II pMHCs, and capable of handling both human and mouse TCR-pMHC pairs, through information sharing enabled by this hybrid design. pMTnet-omni achieves a high overall Area Under the Curve of Receiver Operating Characteristic (AUROC) of 0.888, which surpasses competing tools by a large margin. We showed that pMTnet-omni can distinguish binding affinity of TCRs with similar sequences. Across a range of datasets from various biological contexts, pMTnet-omni characterized the longitudinal evolution and spatial heterogeneity of TCR-pMHC interactions and their functional impact. We successfully developed a biomarker based on pMTnet-omni for predicting immune-related adverse events of immune checkpoint inhibitor (ICI) treatment in a cohort of 57 ICI-treated patients. pMTnet-omni represents a major advance towards developing a clinically usable AI system for TCR-pMHC pairing prediction that can aid the design and implementation of TCR-based immunotherapeutics.

3.

VeChat: correcting errors in long reads using variation graphs.

Luo, Xiao; Kang, Xiongbin; Schönhuth, Alexander.

Nat Commun ; 13(1): 6657, 2022 Nov 04.

Article in English | MEDLINE | ID: mdl-36333324

ABSTRACT

Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in haplotypes of lower frequency in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat as an approach to implement this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than when corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool, publicly available at https://github.com/HaploKit/vechat.

Subjects

Algorithms, Nanopores, DNA Sequence Analysis/methods, Haplotypes, Data Analysis, High-Throughput Nucleotide Sequencing, Software

4.

StrainXpress: strain aware metagenome assembly from short reads.

Kang, Xiongbin; Luo, Xiao; Schönhuth, Alexander.

Nucleic Acids Res ; 50(17): e101, 2022 Sep 23.

Article in English | MEDLINE | ID: mdl-35776122

ABSTRACT

Next-generation sequencing-based metagenomics has made it possible to identify microorganisms in characteristic habitats without the need for lengthy cultivation. Importantly, clinically relevant phenomena such as resistance to medication, virulence or interactions with the environment can already vary within species. Therefore, a major current challenge is to reconstruct individual genomes from the sequencing reads at the level of strains, and not just the level of species. However, strains of one species may differ by only a small number of variants, which makes it difficult to distinguish them. Despite considerable recent progress, related approaches have remained fragmentary so far. Here, we present StrainXpress as a comprehensive solution to the problem of strain-aware metagenome assembly from next-generation sequencing reads. In experiments, StrainXpress reconstructs strain-specific genomes from metagenomes that involve more than 1000 strains and successfully deals with poorly covered strains. The amount of reconstructed strain-specific sequence exceeds that of the current state-of-the-art approaches by on average 26.75% across all data sets (first quartile: 18.51%, median: 26.60%, third quartile: 35.05%).

Subjects

Metagenome, Metagenomics, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis

5.

Enhancing Long-Read-Based Strain-Aware Metagenome Assembly.

Luo, Xiao; Kang, Xiongbin; Schönhuth, Alexander.

Front Genet ; 13: 868280, 2022.

Article in English | MEDLINE | ID: mdl-35646097

ABSTRACT

Microbial communities are usually highly diverse and often involve multiple strains from the participating species due to the rapid evolution of microorganisms. In such a complex microecosystem, different strains may show different biological functions. While reconstruction of individual genomes at the strain level is vital for accurately deciphering the composition of microbial communities, the problem has largely remained unresolved so far. Next-generation sequencing has been routinely used in metagenome assembly but there have been struggles to generate strain-specific genome sequences due to the short-read length. This explains why long-read sequencing technologies have recently provided unprecedented opportunities to carry out haplotype- or strain-resolved genome assembly. Here, we propose MetaBooster and MetaBooster-HiFi, as two pipelines for strain-aware metagenome assembly from PacBio CLR and Oxford Nanopore long-read sequencing data. Benchmarking experiments on both simulated and real sequencing data demonstrate that either the MetaBooster or the MetaBooster-HiFi pipeline drastically outperforms the state-of-the-art de novo metagenome assemblers, in terms of all relevant metagenome assembly criteria, involving genome fraction, contig length, and error rates.

6.

Strainline: full-length de novo viral haplotype reconstruction from noisy long reads.

Luo, Xiao; Kang, Xiongbin; Schönhuth, Alexander.

Genome Biol ; 23(1): 29, 2022 Jan 20.

Article in English | MEDLINE | ID: mdl-35057847

ABSTRACT

Haplotype-resolved de novo assembly of highly diverse virus genomes is critical in prevention, control and treatment of viral diseases. Current methods either can handle only relatively accurate short read data, or collapse haplotype-specific variations into a consensus sequence. Here, we present Strainline, a novel approach to assemble viral haplotypes from noisy long reads without a reference genome. Strainline is the first approach to provide strain-resolved, full-length de novo assemblies of viral quasispecies from noisy third-generation sequencing data. Benchmarking on simulated and real datasets of varying complexity and diversity confirms this novelty and demonstrates the superiority of Strainline.

Subjects

Contig Mapping/methods, Viral Genome, Haplotypes, SARS-CoV-2/genetics, Software, Benchmarking, COVID-19/virology, High-Throughput Nucleotide Sequencing, Humans, SARS-CoV-2/classification, DNA Sequence Analysis

7.

Reprocessing 16S rRNA Gene Amplicon Sequencing Studies: (Meta)Data Issues, Robustness, and Reproducibility.

Kang, Xiongbin; Deng, Dong Mei; Crielaard, Wim; Brandt, Bernd W.

Front Cell Infect Microbiol ; 11: 720637, 2021.

Article in English | MEDLINE | ID: mdl-34746021

ABSTRACT

High-throughput sequencing technology provides an efficient method for evaluating microbial ecology. Different bioinformatics pipelines can be used to convert 16S ribosomal RNA gene amplicon sequencing data into an operational taxonomic unit (OTU) table that is used to analyze microbial communities. It is important to assess the robustness of these pipelines, each with specific algorithms and/or parameters, and their influence on the outcome of statistical tests. Articles with publicly available datasets on the oral microbiome were searched for, and five datasets were retrieved. These were from studies on changes in microbiota related to smoking, oral cancer, caries, diabetes, or periodontitis. Next, the data was processed with four pipelines based on VSEARCH, USEARCH, mothur, and UNOISE3. OTU tables were rarefied, and differences in α-diversity and β-diversity were tested for different groups in a dataset. Finally, these results were checked for consistency among these example pipelines. Of articles that deposited data, only 57% made all sequencing and metadata available. When processing the datasets, issues were encountered, caused by read characteristics and differences between tools and their defaults in combination with a lack of detail in the methodology of the articles. In general, the four mainstream pipelines provided similar results, but importantly, P-values sometimes differed between pipelines beyond the significance threshold. Our results indicated that for published articles, the description of bioinformatics methods and data deposition should be improved, and regarding reproducibility, that analysis of multiple subsamples is required when using rarefying as a library-size normalization method.
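
The recommendation to analyze multiple subsamples when rarefying can be sketched as follows. This is an illustrative sketch only, not the pipeline used in the study: the function names, the use of observed richness as the α-diversity measure, and the toy counts are assumptions. Rarefying draws a fixed number of reads per sample without replacement, so a single draw is one random outcome; repeating the draw shows how much a diversity value depends on it.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the sample's read count")
    return Counter(rng.sample(pool, depth))


def richness_over_subsamples(otu_counts, depth, n_subsamples, seed=0):
    """Observed OTU richness for each of several independent rarefied subsamples."""
    rng = random.Random(seed)
    return [len(rarefy(otu_counts, depth, rng)) for _ in range(n_subsamples)]


# Hypothetical sample: OTU name -> read count.
sample = {"OTU1": 50, "OTU2": 30, "OTU3": 5, "OTU4": 1}
richness_values = richness_over_subsamples(sample, depth=40, n_subsamples=10)
```

Rare OTUs such as the hypothetical OTU4 may or may not be drawn in a given subsample, which is one way results can shift between runs.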

Subjects

Computational Biology, High-Throughput Nucleotide Sequencing, rRNA Genes, 16S Ribosomal RNA/genetics, Reproducibility of Results, DNA Sequence Analysis

8.

phasebook: haplotype-aware de novo assembly of diploid genomes from long reads.

Luo, Xiao; Kang, Xiongbin; Schönhuth, Alexander.

Genome Biol ; 22(1): 299, 2021 Oct 27.

Article in English | MEDLINE | ID: mdl-34706745

ABSTRACT

Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly. However, current long-read assemblers are either reference based, so introduce biases, or fail to capture the haplotype diversity of diploid genomes. We present phasebook, a de novo approach for reconstructing the haplotypes of diploid genomes from long reads. phasebook outperforms other approaches in terms of haplotype coverage by large margins, in addition to achieving competitive performance in terms of assembly errors and assembly contiguity.

Subjects

Diploidy, Haplotypes, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Genomics, Humans, Nanopore Sequencing

9.

Isolation and whole genome sequencing of fetal cells from maternal blood towards the ultimate non-invasive prenatal testing.

Chen, Fang; Liu, Ping; Gu, Ying; Zhu, Zhu; Nanisetti, Amulya; Lan, Zhangzhang; Huang, Zhiwei; Liu, Jia Sophie; Kang, Xiongbin; Deng, Yuqing; Luo, Liqiong; Jiang, Dan; Qiu, Yong; Pan, Jianchang; Xia, Jun; Xiong, Ken; Liu, Chao; Xie, Lin; Shi, Qianyu; Li, Jing; Zhang, Xiuqing; Wang, Wei; Drmanac, Snezana; Bolund, Lars; Jiang, Hui; Drmanac, Radoje; Xu, Xun.

Prenat Diagn ; 37(13): 1311-1321, 2017 Dec.

Article in English | MEDLINE | ID: mdl-29144536

ABSTRACT

OBJECTIVE: The purpose of this study was to develop a methodology for isolating fetal cells from maternal blood and to use deep sequencing to demonstrate the promise of complete and accurate genetic screening compared to other non-invasive prenatal testing. METHODS: In this study, we developed a double negative selection (DNS) procedure to enrich fetal cells in an unbiased manner. After validation by short tandem repeat (STR) analysis, the isolated circulating fetal cells (CFCs) were subjected to deep whole genome sequencing analysis. RESULTS: Our DNS protocol significantly increased the purity of the mimic fetal cells from 1 in 1 million nucleated cells in whole blood to 1:8 to 1:30 (12.5%-3.33%) after 2 steps of enrichment. An isolated single fetal cell achieved a coverage rate (86.8%) and allelic dropout rate (24.90%) comparable to the reported results for a human cell line. Several disease-associated variants were identified in the whole genome sequencing data of isolated CFCs and further confirmed in the sequencing data of unamplified gDNA. CONCLUSION: The robustness of DNS and STR in collecting CFCs from peripheral maternal blood, coupled for the first time with deep sequencing, demonstrates the possibility of comprehensive non-invasive prenatal testing of genetic disorders using isolated CFCs.

Subjects

Cell Separation/methods, Maternal Serum Screening Tests/methods, Whole Genome Sequencing, Feasibility Studies, Female, Humans, Microsatellite Repeats, Paternity, Pregnancy

10.

An Advanced Model to Precisely Estimate the Cell-Free Fetal DNA Concentration in Maternal Plasma.

Kang, Xiongbin; Xia, Jun; Wang, Yicong; Xu, Huixin; Jiang, Haojun; Xie, Weiwei; Chen, Fang; Zeng, Peng; Li, Xuchao; Xie, Yifan; Liu, Hongtai; Huang, Guodong; Chen, Dayang; Liu, Ping; Jiang, Hui; Zhang, Xiuqing.

PLoS One ; 11(9): e0161928, 2016.

Article in English | MEDLINE | ID: mdl-27662469

ABSTRACT

BACKGROUND: With the rapid development of sequencing technologies, noninvasive prenatal testing (NIPT) has been widely applied in clinical practice for testing for fetal aneuploidy. The cell-free fetal DNA (cffDNA) concentration in maternal plasma is the most critical parameter for this technology because it affects the accuracy of NIPT-based sequencing for fetal trisomies 21, 18 and 13. Several approaches have been developed to calculate the cffDNA fraction of the total cell-free DNA in the maternal plasma. However, most approaches depend on specific single nucleotide polymorphism (SNP) allele information or are restricted to male fetuses. METHODS: In this study, we present an innovative method to accurately deduce the concentration of the cffDNA fraction using only maternal plasma DNA. SNPs were classified into four maternal-fetal genotype combinations and three boundaries were added to capture effective SNP loci in which the mother was homozygous and the fetus was heterozygous. The median value of the concentration of the fetal DNA fraction was estimated using the effective SNPs. A depth-bias correction was performed using simulated data and corresponding regression equations for adjustments when the depth of the sequencing data was below 100-fold or the cffDNA fraction was less than 10%. RESULTS: Using our approach, the median of the relative bias was 0.4% in 18 maternal plasma samples with a median sequencing depth of 125-fold. There was a significant association (r = 0.935) between our estimations and the estimations inferred from the Y chromosome. Furthermore, this approach could precisely estimate a cffDNA fraction as low as 3%, using only maternal plasma DNA at the targeted region with a sequencing depth of 65-fold. We also used PCR instead of parallel sequencing to calculate the cffDNA fraction. There was a significant association (r = 98.2%) between our estimations and those inferred from the Y chromosome.
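
The core idea in METHODS can be sketched under stated assumptions. At a locus where the mother is homozygous and the fetus is heterozygous, the fetus-specific allele is expected in roughly half of the fetal reads, so the fetal fraction is about twice the minor-allele read fraction; the median is then taken over such loci. The function name and the `low`/`high` cut-offs below are hypothetical stand-ins, not the paper's three boundaries, and the depth-bias correction is not modelled.

```python
from statistics import median


def estimate_fetal_fraction(snp_counts, low=0.01, high=0.25):
    """Median fetal-fraction estimate from (major_reads, minor_reads) pairs.

    Loci whose minor-allele fraction lies in [low, high] are treated as
    'effective' (mother homozygous, fetus heterozygous). The bounds are
    illustrative assumptions only.
    """
    estimates = []
    for major, minor in snp_counts:
        depth = major + minor
        if depth == 0:
            continue
        minor_fraction = minor / depth
        if low <= minor_fraction <= high:
            # The fetus contributes the minor allele on one of its two copies.
            estimates.append(2 * minor_fraction)
    if not estimates:
        raise ValueError("no effective SNP loci found")
    return median(estimates)


# Hypothetical read counts: three effective loci, one maternal-heterozygous
# locus (50/50) and one uninformative locus (100/0), both filtered out.
toy_counts = [(95, 5), (94, 6), (96, 4), (50, 50), (100, 0)]
fetal_fraction = estimate_fetal_fraction(toy_counts)
```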

11.

Complete mitochondrial genome of Drosophila albomicans.

Kang, Xiongbin; Luo, Xiao; Zhang, Zhi; Zhang, Zhen; Yang, Junqing; Bi, Guiqi.

Mitochondrial DNA A DNA Mapp Seq Anal ; 27(5): 3638-9, 2016 Sep.

Article in English | MEDLINE | ID: mdl-26358579

ABSTRACT

Drosophila albomicans has been widely used as an important animal model for chromosome evolution. In this study, the mitochondrial genome sequence of this species is determined and described for the first time. The mitochondrial genome (15 849 bp) encompasses two rRNA, 22 tRNA, and 13 protein-coding genes. Genome content and structure are similar to those reported from other Drosophila mitochondrial genomes. Phylogenetic analysis indicates that D. albomicans has a closer genetic relationship with Drosophila incompta and Drosophila littoralis. This mitochondrial genome is potentially important for studying molecular evolution and conservation genetics in the genus Drosophila.

Subjects

Drosophila/genetics, Mitochondrial Genome, Animals, Drosophila Proteins/genetics, Molecular Evolution, Insect Genes, Mitochondrial Proteins/genetics, Molecular Sequence Annotation, Phylogeny, Ribosomal RNA/genetics, Whole Genome Sequencing

12.

Complete mitochondrial genome of the American flamingo, Phoenicopterus ruber (Phoenicopteriformes, Phoenicopteridae).

Luo, Xiao; Kang, Xiongbin; Zhang, Dongya.

Mitochondrial DNA A DNA Mapp Seq Anal ; 27(5): 3519-20, 2016 Sep.

Article in English | MEDLINE | ID: mdl-26260170

ABSTRACT

The American flamingo, Phoenicopterus ruber (P. ruber), is a large species of flamingo closely related to the greater flamingo and Chilean flamingo. In this paper, the complete mitochondrial genome sequence of P. ruber has been assembled for the first time. It was 17 476 bp in length and consisted of 13 typical vertebrate protein-coding genes, 22 tRNA genes, 2 rRNA genes and 2 control regions. The COI and ND3 genes used GTG and ATC as start codons, respectively, whereas the remaining protein-coding genes began with the standard ATG codon. Two triplet codons (TAA, AGG) and one single T base were employed as stop codons. The arrangement of the genes and noncoding regions was identical to that of the congeneric flamingo Phoenicopterus roseus. The AT content (54.27%) was higher than the GC content. Phylogenetic analysis was performed using 12 protein-coding genes, together with 11 other species from Neognathae, which validated the reliability and utility of this new mitochondrial genome.
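
As a small worked illustration of the base-composition statement: if only A, C, G and T are counted, an AT content of 54.27% implies a GC content of 100 - 54.27 = 45.73%. The sketch below shows how such fractions can be computed from a sequence; the function name and the toy sequence are assumptions, not material from the paper.

```python
def base_composition(sequence):
    """Return AT and GC fractions, counting only unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {"AT": at / total, "GC": gc / total}


# Hypothetical toy sequence; ambiguous bases such as N are ignored.
composition = base_composition("ATGCATNN")
```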

Subjects

Birds/genetics, Mitochondrial Genome, Mitochondria/genetics, DNA Sequence Analysis/methods, Animals, Base Composition, Ribosomal DNA/genetics, Gene Order, Genome Size, Phylogeny, Transfer RNA/genetics, United States
